Natural Language Processing Using Neighbour Entropy-based Segmentation

Authors

Abstract

In natural language processing (NLP) of Chinese hazard text collected in the process of hazard identification, Chinese word segmentation (CWS) is the first step to extracting meaningful information from such semi-structured Chinese texts. This paper proposes a new neighbor entropy-based segmentation (NES) model for CWS. The model considers the segmentation benefits of neighbor entropies, adopting the concept of "neighbor" in optimization research. It is defined by the benefit ratio of text segmentation, including the benefits and losses of combining the segmentation unit, with more information than other popular statistical models. In the experiments performed, together with the maximum-based segmentation algorithm, the NES model achieves 99.3% precision, 98.7% recall, and 99.0% f-measure for text segmentation; these performances are higher than those of existing tools based on seven popular statistical models. Results show that the NES model is a valid CWS, especially for text segmentation requirements necessitating longer-sized characters. The text corpus used comes from the Beijing Municipal Administration of Work Safety, which was recorded in the fourth quarter of 2018.
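As a rough illustration of the quantities the abstract mentions, the sketch below computes left and right neighbor entropies for a candidate string and checks that the reported precision and recall are consistent with the reported f-measure. It is not the paper's NES benefit ratio, which the abstract does not spell out; it only shows the generic neighbor (branching) entropy that such models build on. The function names and the four-sentence toy corpus are made up for this example, and the f-measure is assumed to be the usual harmonic mean of precision and recall.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a Counter of neighbor characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def neighbor_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) of `candidate` over `corpus`.

    High entropy on a side means many different characters appear there,
    which suggests a word boundary; low entropy suggests the candidate
    is only part of a longer unit.
    """
    left, right = Counter(), Counter()
    n = len(candidate)
    for sentence in corpus:
        start = sentence.find(candidate)
        while start != -1:
            if start > 0:
                left[sentence[start - 1]] += 1
            end = start + n
            if end < len(sentence):
                right[sentence[end]] += 1
            start = sentence.find(candidate, start + 1)
    return shannon_entropy(left), shannon_entropy(right)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus, not taken from the paper's data.
toy_corpus = ["消防通道堵塞", "安全通道堵塞", "消防通道被占用", "疏散通道不畅"]

whole = neighbor_entropy(toy_corpus, "通道")  # varied neighbors on both sides
part = neighbor_entropy(toy_corpus, "通")     # always followed by the same character
reported_f = f_measure(0.993, 0.987)          # figures reported in the abstract
```

In this toy corpus the right neighbor entropy of "通" is zero because it is always followed by "道", so a boundary after "通" would be a poor choice, while "通道" has varied neighbors on both sides. The harmonic mean of the reported 99.3% precision and 98.7% recall rounds to the reported 99.0%.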


Similar resources

Segmentation Standard for Chinese Natural Language Processing

This paper proposes a segmentation standard for Chinese natural language processing. The standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity. Linguistic felicity is maintained by defining a segmentation unit to be equivalent to the theoretical definition of word, and by providing a set of segmentation principles that are equivalent to a functional...


A Maximum Entropy Approach to Natural Language Processing

The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihoo...


Statistical Natural Language Processing Method for Variant Texts Segmentation

It is well known that some techniques have already been developed to automatically subdivide texts into multiparagraph subtopic passages, such as TextTiling methodology proposed by Hearst. However, an additional algorithm is needed to perform a similar task for parallel or variant texts, because ambiguous and complicated traces of cross citation among them might often generate some sinuous patt...


Unsupervised Natural Language Processing Using Graph Models

In the past, NLP has always been based on the explicit or implicit use of linguistic knowledge. In classical computer linguistic applications explicit rule based approaches prevail, while machine learning algorithms use implicit knowledge for generating linguistic knowledge. The question behind this work is: how far can we go in NLP without assuming explicit or implicit linguistic knowledge? Ho...



Journal

Journal title: Journal of Computing and Information Technology

Year: 2022

ISSN: 1846-3908, 1330-1136

DOI: https://doi.org/10.20532/cit.2021.1005393